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I show how to reintroduce velocity dispersion into perturbation theory (PT) calculations of struc- 
ture in the Universe, i.e., how to go beyond the pressureless fluid approximation, starting from first 
principles. This addresses a possible deficiency in uses of PT to compute clustering on the weakly 
non-linear scales that will be critical for probing dark energy. Specifically, 1 show how to derive a 
non-negligible value for the (initially tiny) velocity dispersion of dark matter particles, (Sv 2 ), where 
0^ , Sv is the deviation of particle velocities from the local bulk flow. The calculation is essentially a 

■ renormalization of the homogeneous (zero order) dispersion by fluctuations 1st order in the initial 

power spectrum. For power law power spectra with n > —3, the small-scale fluctuations diverge and 
significant dispersion can be generated from an arbitrarily small starting value - the dispersion level 
(-^ . is set by an equilibrium between fluctuations generating more dispersion and dispersion suppressing 

^ ■ fluctuations. For an n = —1.4 power law normalized to match the present non-linear scale, the 

dispersion would be ~ 100 km s _1 . This n corresponds roughly to the slope on the non-linear scale 
in the real ACDM Universe, but ACDM contains much less initial small-scale power - not enough 
to bootstrap the small starting dispersion up to a significant value within linear theory (viewed 
very broadly, structure formation has actually taken place rather suddenly and recently, in spite of 
the usual "hierarchical" description). The next order PT calculation, which 1 carry out only at an 
order of magnitude level, should drive the dispersion up into balance with the growing structure, 
accounting for small dispersion effects seen recently in simulations. 
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I. INTRODUCTION 



Observing the large-scale density fluctuations in the Universe is one of the best ways we have to approach many 
fundamental questions about the Universe, e.g., understanding inflation, dark matter, dark energy, the curvature of 
the Universe, neutrino masses, possible extra dimensions, modifications of gravity, etc. [e.g.. HI. 13. la. R IH Rj M. IR [qL ITol. 
' E2, EH, 13 EH EH, EH • Statistics of the current density fluctuations can be used to infer statistics of the small initial 
perturbations from which they grew, and in turn to understand the physics of the very early Universe. Measuring the 
evolution of large-scale structure (LSS) over time tells us about the present matter content of the Universe and the 
dynamical rules its evolution follows. Before we can learn anything about fundamental properties of the Universe, 
however, we must be able to compute the dir ectly observable as trop hysical quantities (e.g., the CMB TT], galaxy 
clustering statistics [Hj] , Lya forest absorption [l9|, H(| [52, H2, HI, HJ, galaxy ellipticity correlations used to measure 
0^ ! weak lensing (25|, galaxy cluster/Sunyaev-Zel'dovich effect (SZ) measurements [2g|, and possibly future 21cm surveys 
< ^ ' [13) given a hypothetical underlying model. 

As observational statistics become more and more precise, with the potential to measure more and more subtle 
differences between models, the requirements on the phenomenological theory needed for their interpretation become 
correspondingly more stringent. Currently, linear-order perturbation theory |28l . 12^ . [30L l3l| provides our primary 
means of calculating LSS observables for cosmological parameter estimation and model testing, with only ad hoc, and 
recently demonstrably inadequate, attempts to correct for non-linearity once it becomes non-neglig ible us Emm 
(a vast number of papers have been written on beyond- linear calculations, but most of these are never used in parameter 
estimation papers). Linear theory can robustly describe observations on very large scales or at very early times, but 
breaks down when the perturbations become too large on a given scale. When considering gravitational evolution only 
(i.e., dark matter only), numerical simulations can be used to compute the fully non-linear evolution of the density 
field to high accuracy (with a lot of care and computer power [34|, HE HH), however, as discussed extensively in 
[37J , numerical simulations are a less than ideal tool for interpreting future precision observations, once one considers 
real observables which are inevitably influenced by baryonic effects (e.g ., star formation, and the accompanying 
complication of g eneral g as dynamics). To summarize the argument in [371 ]: Beyond linear order perturbation theory 
(PT) [29|, [H, [33, 53, 52, 52, 53, 53 should provide the primary means of interpreting very high precision future 
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LSS data, just like linear theory has provided the primary means in the past. The range of scales over which higher 
order PT will be necessary (i.e., linear theory is insufficient) and sufficient (i.e., it will be accurate enough after 
computing a modest number of terms) will become larger as the precision of observations increases, while the chances 
of robustly, completely, convincingly describing the full precision of the observations using inevitably somewhat ad hoc 
prescriptions for star formation and related things in simulations becomes more remote. The key difference between 
PT and simulations is the fact that perturbative clustering can be completely described by a finite set of well-defined 
parameters, no matter how complicated the small-scale physics is (at least as conjectured in (37j). while the need for 
fully non-linear calculations implies that there is generally no bound on the number or form of free parameters (more 
or less by definition) . The idea that the importance of PT relative to simulations is increasing with time may seem 
backwards relative to conventional wisdom, however, my argument is that this conventional wisdom was developed 
for the era of not very precise observations, when corrections to linear theory were already too large to be described 
by PT by the time they were large enough to be statistically significant, i.e., past PT work was premature. To be 
clear, I am not saying that simulations will not be extremely useful, only that they are unlikely to be the leading tool 
for extracting fundamental cosmological information from very high precision observations (much like the situation 
in high energy physics, where lattice QCD simulations provide much qualitative, and recently even quite precise 
quantitative, insight 14511 . but high precision constraints on models are made primarily in the regime accessible to 
perturbation theory [4q). 

Following the above line of reasoning, this paper is aimed at building up our ability to do calculations based 
on perturbation theory, ft deals exclusively with the gravity-only part of the calculation, however, this should be 
viewed only as an intermediate goal. PT is most necessary for practical computations of observables which can 
not realistically be done from first principles in a simulation - understanding how to do computations for dark 
matter-only is simply a prerequisite for computin g th ese observables. This continues a recent line of work related 
to renormalization/resummation methods [43, \MAM, US HI HI HI, HI, H3, S HI HI lEO, [H [H . In my 
opinion, the key to maximizing the usefulness of all of this work is eventually connecting it to galaxy biasing and 
related complications [33, [13, EH, [6(| • 

One traditional potential limitation of LSS perturbation theory, which I focus on addressing in this paper, is that 
it assumes the particles at a point in space all have exactly the same velocity [6 71 ] . The equations used are those of a 
pressureless fluid. This is often referred to as the "single-stream approximation" , however, I will avoid this language 
as it assumes a certain picture of large-scale structure formation that may or may not have anything to do with reality. 
I say this because, for typically observed times, stream crossing is actually ubiquitous - in fact, there are many orders 
of magnitude in scale of deeply non-linear structure, to the point where the initial conditions on very small scales 
may be effectively forgotten. Of course, the standard language implicitly means no stream crossing when the field is 
in some sense smoothed on the typical scale where the perturbation theory is supposed to apply. When you look at 
it this way, however, it is clear that, if coldness is a good effective theory, it is not simply because the particles were 
initially cold, there is an additional requirement that the velocity field remains effectively cold on scales just smaller 
than the ones of interest. 

To give a qualitative preview of the paper: The exact equation for the evolution of collisionless particles is the Vlasov 
equation for the full distribution function in 6-dimensional phase space (particle density in position and momentum 
space) . The standard hydrodynamic equations are derived by taking moments of the Vlasov equation with respect to 
momentum - the zeroth moment gives an equation for the evolution of density, the first moment gives an equation 
for the evolution of bulk velocity, and higher moments are normally dropped, e.g., the second moment which would 
describe velocity dispersion. At first glance, one might think that standard PT could be extended by simply adding the 
evolution equation for the 2nd and possibly higher moments, however, that turns out to be less straightforward than 
it sounds. Viewed conventionally, dropping these higher moments is not really an added approximation in standard 
Eulerian perturbation theory. The lowest order evolution equations contain only a Hubble drag term, so any small 
initial velocity dispersion will rapidly become even smaller. Higher order equations contain no source terms, meaning 
that the higher order results are always proportional to the tiny starting value, [68| showed that even perturbation 
theory using the full distribution function directly leads to the same result. I will show, however, that this argument 
for dropping dispersion from the perturbation theory is faulty, in that, while the lowest order terms are very small, the 
series is very rapidly diverging, in the sense that higher order terms are increasing in size instead of decreasing. This 
implies that some rearrangement of the series is needed, as in a resummation or renormalization group calculation. In 
case this mathematical reasoning is not sufficient motivation, recently [67j computed the velocity dispersion generated 
in N-body simulations, finding it to be small but not completely dynamically negligible. 

There has been much discussion in the past about different ways of adding velocity dispersion, or more general 
changes to the small-scale effective theory used for PT calculations [e.g., [H, [7l|, u^, [73|, [zH [zH, however, these 
papers all lacked a first principles derivation, starting from the exact equations, of the model they use. This made their 
usefulness for precision calculations questionable, as there were always added assumptions and/or free parameters. 
The point of this paper is to show how to do a straightforward computation that takes us directly from the initial 
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homogeneous, cold, starting theory to a theory with highly developed, potentially hot, small-scale structure. 

The rest of the paper is as follows: In §11] I show how to use renormalization group- inspired ideas to reintroduce 
velocity dispersion from first principles. In flllll l give a short discussion of the implications of these velocity dispersion 
calculations for redshift-space distortions. Finally, in ^IVI I discuss the results. 



II. GENERATION OF VELOCITY DISPERSION 



In this section I present the calculation that generates velocity dispersion. In ^11 Al I discuss the time evolution 
equations for cold dark matter, which are the starting point for perturbation theory. In ^IIBI I compute the velocity 
dispersion taking the standard PT approach, which leads to negligible dispersion. In mi CI I apply an approximate 
renormalization group approach to models with power law initial power spectra, which demonstrates how significant 
dispersion can be generated. Finally, in ^IIDI I take a related but more exact approach, which should be easily 
extendible to more general, higher order, calculations, including ACDM power spectra. 



A. Evolution equations 

The exact evolution of collisionless particles is is described by the Vlasov equation [2 

df , 1 



with 



p.V/-amVfVp/ = 0, (1) 

or a m 

3 S 

\7 2 <f ) = 4TT G a 2 p5= -Hi Q m , - , (2) 

where /(x, p, r) is the particle density at phase-space position (x, p), m is the particle mass (which plays no role in 
the final results), and p = a mv p (v p here is a particle's peculiar velocity, not to be confused with the mean peculiar 
velocity used everywhere else), x is the comoving position and r = J dt/a is the conformal time, with a = 1/(1 + z) 
the expansion factor. Except when otherwise indicated V = <9/<9x. The density field is obtained by averaging the 
distribution function over momentum: 

p(x, t) ee mo -3 J d 3 p /(x, p, r) , (3) 

and the bulk (mean) velocity and higher moments of the velocity distribution e.g., the dispersion of particle velocities 
around their bulk velocity, can be similarly obtained by multiplying the distribution function by any number of p's 
(e.g., one to obtain bulk velocity) before integrating over p. The mean velocity of the particles at x is 

v(x r) ^ (p/mtt)/d3p (4) 

1 ' Jf(Pp ■ [ > 

and the velocity dispersion tensor is 

o» (x, r) ee K j fd 3p ~ vv3 > ( 5 ) 

i.e., cr y (x, t) ee (6v % Sv^ with Sv l the deviation of a particle's velocity from the local mean velocity, and the average 
is over all particles at x. 

As discussed by [2{J, taking moments of the Vlasov equation with respect to momentum leads to a hierarchy of 
evolution equations for these quantities. The density evolution equation is the usual continuity equation, 

BS 

^ + di[(l + S)v*]=0, (6) 

where <5(x, t) = p(x.,r)/p — 1, dk = d/dx k . The bulk velocity evolution equation acquires a velocity dispersion term 
that is usually dropped to give the Euler equation for standard perturbation theory 
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Hv l + ddfo* = -d i( j> - 3 L \ ; ; i , (7) 
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where Ti = d In ajdr — aH with H the usual Hubble parameter. Multiplying the Vlasov equation by piPj and 
integrating over p gives (after some substitution to remove bulk velocity terms) 

ii k n in ik n i ik n i d k [(1 + $) Q 1 ^] 

-r— + 2H(T J +v k 9 fc cr y + a jk d k v l + a lk d k v 3 = l -— L , (8 

OT 1+0 

where g yfe (x, t) = {Sv l oV Sv k ) p . Similar equations can be derived for q^ k and higher moments. 



B. Standard perturbation theory 



Perturbation theory consists of writing the fields as a series of terms of at least formally increasing order of smallness, 
i.e., 5 = 5\ + 62 + S3 + .... The evolution equations are solved order- by-order, with lower order solutions appearing as 
sources in the higher order equations so that S n is of order 5™ [H[ . 

For simplicity, I will often assume an Einstein-de Sitter (EdS) Universe. This assumption makes analytic calculations 
easier without qualitatively changing the results. We then have useful relations: 



r 

3^ 



tH c , 



n, 2 , 3i# 6 

n = -, and = — 

r 2 a t 



(9) 



Symmetry allows for a zero order (assumed to be homogeneous and isotropic) component of cr 1 - 7 , To( T ) = °o 



<7q = <7g , and in fact we expect there to be primordial velocity dispersion for realistic WIMPs, albeit very small. 
The background dispersion evolves following: 



Tq(t) oc a 2 oc r 

where the normalization must be fixed by the initial conditions. 
The linearized equations are: 



7h~ 



div\ 



= 



-di(j)i - dj<j\ J - T di6i 



8t 



2 H of + T d jV i + T 8i4 







(10) 

(11) 

(12) 
(13) 



where the Poisson equation holds order by order relating 5 to <f>. Here is where the story ends for velocity dispersion 
in standard PT. The CDM velocity dispersion is supposed to be initially small, and only gets smaller. At 1st order 
the evolution equation for a rj contains only the Hubble drag term and terms proportional to the tiny To, so the 
perturbations will remain small. The last term from Eq. ([7]) can be dropped and the hierarchy is closed. I will show 
that this treatment is justified in the approach of standard PT, but retaining the apparently small dispersion terms 
allows for the renormalization in the next subsection. I will drop for simplicity (and because symmetry prevents 
it from acquiring a non-zero background value) . 

In this paper my only goal is to renormalize the zero-order velocity dispersion, To, so I am going to henceforth 
assume the velocity field is curl-free and only solve for V • v and didja 1 ^ . The techniques of this paper could probably 
be used to reactivate the vorticity variables, but symmetry guarantees that they will not have any homogeneous (zero 
order, i.e., background) component (see [67j for a measurement of the vorticity power spectrum in simulations, and an 
argument that vorticity is completely irrelevant for large-scale clustering). The linearized equations, assuming EdS, 
can be re-written as: 



dSi 
dr] 



(14) 



+ g 0i = § - 2 k2f o (Si + 2*0 



(15) 



dn! 9 In To 

— — + 7Ti = 0i 7Ti , 

or] or] 



(16) 
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where 



i] = In a (17) 
ik ■ v 



H 



(18) 



1 fcife, <T y . , 

(19) 



To = ^ (20) 



I have moved to Fourier space, e.g., <5(k, t) = / d 3 x exp(ik • x) <5(x, t), where V — > — ik. 

These equations can't be solved exactly. We expect that the density and velocity modes will be as usual linear theory 
at low k until suppressed below some Jeans-like filtering scale. The basic behavior can be found more quantitatively 
from a small k expansion, where I first drop the k 2 terms and obtain, using Tq = Tidi/a, the usual growing mode 
solutions, Si(k — > 0) = 0\(k — > 0) = n\(k — > 0) = SiCi/cii. I then substitute these solutions into the k 2 terms to obtain, 
to order k 2 , 

Si = 9 1 =tti =Si- fl-?fc 2 T^ . (21) 



a; V 5 

Here I have eliminated the new integration constants at order k 2 by requiring that the 0(k 2 ) corrections are zero at 
some very early initial time a;. With the assumption that the fields are curl free, we can invert the ~n\ solution to 
obtain 

*\ j = 2f ^S 1 (22) 

If I assume the Jeans-like suppression kernel is Gaussian, i.e., exp (— k 2 R 2 F /2^ we have 1 — 6fc 2 Tj/5 = 1 — k 2 R 2 F /2 

so Rf = (l2Ti/5) 1 ^ 2 . Note that I have in this derivation only retained the fasted growing parts of 5i, 6\, and 7Ti, 
which is consistent in standard perturbation theory where the initial time can be taken to be arbitrarily small, but, as 
we will see, is dangerous when we start renormalizing. Finally, note that as long as k 2 Ti is very small, which it will be 
for CDM on observable scales (basically by definition), we have changed nothing by retaining the velocity dispersion 
terms, i.e., the standard perturbation theory with zero velocity dispersion is self-consistent at this order. 
The 2nd order equations are: 



dr 

Qt) 1 ■ ■ ■ — ■ ■ — 

+ Hv\ + v{d jV \ = -d l( t> 2 - d 3 a l i + Sifo&S! - afdjSi - f d z S 2 (24) 



as 

-^+d i vi + 8 l d i v\ + v\d i 8 l = (23) 



B(j lJ 

— 5- + 2 H u l i + f djvi + f divi = -t>{ d k a» - a{ k d k v\ - af d k v{ . (25) 

OT 

For now I am only interested in T 2 = (c^ 1 ) = (cf 2 ) = (cf 3 ) = (& 2 ) /3, which will renormalize To. 

f)T 1 

+ 2 H T 2 + - («* 9 fe crf + 2af d k v\) = . (26) 
Evaluating this in Fourier space, using the same variable redefinitions as above, including T 2 = T 2 7i -2 , gives 

^+f 2 = 2 -fo(^e 1 )^ 2 -f (Sl) , (27) 

where deriving the term involving (7Ti#i) only requires that 9\ and tt\ completely describe the velocities, while the 
term involving (<5 2 ) requires the above approximate relation between Si, 9\, and m- 

Using the right-most side of Eq. (|27p . and assuming standard linear theory for S, the solution is: 

f 2 = if (Si) + C - (28) 
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where c is a constant. If I choose c to make T2(a 
that f = T/H 2 ), I finally obtain 



ai) = 0, and use the zero order solution Tq = A/ a (remembering 



A 1 
T(a) = - + - 
a 3 



4 <«?>-- 



(29) 



The standard PT approach would be to drop the term on the right in the brackets, assuming a » ai. From the point 
of view of standard PT, we could conclude that the result for Ta is small, for ACDM, so dropping velocity dispersion 
is justified (while (5f) is large for ACDM, it is not large enough to overcome the small dispersion and make (S 2 ) T 
large); however, we have a clear breakdown in the premise of the perturbation theory, because T2/T0 ~ (^1) >> !• 



C. Renormalization group approach 

The problem with the standard calculation outlined in mi B[ which leads me to renormalization, is that the 2nd 
term in Eq. (j29)) (T 2 ) diverges relative to the first (To), increasingly as time progresses. One might argue that the 
correction is still small, in an absolute sense, but this is nonsense because there is no reason not to expect the higher 
order terms in the series to be even larger, i.e., the truth could be arbitrarily large. Fortunately, we have tools to deal 
with this kind of breakdown in perturbative solutions of differential equations [z3, [zl, [79|, [8(| • 

A key observation about perturbation theory is that there is inevitable ambiguity in the solution that comes from 
solving a set of differential equations at each order. We obtain a new set of integration constants at each order, while 
the initial conditions only determine one set. This is apparent in Eq. (|29p . where I fixed the 2nd order integration 
constant by the arbitrary requirement that T2(a — a^ = 0. I can instead fix it to set T2(a*) = 0, where a* is some 
arbitrary time, so the solution is 

T(a) = ^ + l^[(S!)-(S 2 (a,))] (30) 

where A+ must depend on a*, in order to match the initial conditions. The final result should not depend on the 
arbitrary a*, so the RG equation is obtained by taking the derivative of T(a) with respect to a*, evaluated at a = a*, 
and setting it equal to zero: 

-j- =0 = —^a- 1 --A*a- 1 -^ (31) 
eta* da 3 oa 



or 

da 3 oa 

Now, (5 2 ) generally depends on A+ through the filtering scale Rp, as discussed above, but supposed for the moment 
that the power spectrum was not truncated so (5\ ) is independent of A+. The solution to the RG equation would be, 
long after the initial time, 

where Ai is the initial A. For a power law with n > —3, (<5^) i s infinite, so we have infinite growth of the velocity 
dispersion, but even in the case of ACDM with asymptotic high-fc slope slightly less than -3 (because the primordial 
slo pe, from inflation, appears to be less than one exp ((6 2 ) /3) will become enormous. Furthermore, as discussed 
in [55| . higher order corrections to the power spectrum generically push the slope to larger values, so the result, 
without filtering, will be even larger when calculated to higher order, to the point of being practically infinite. Infinite 
velocity dispersion is of course too much - the point here is simply to demonstrate how the ridiculously small initial 
seed velocity dispersion in CDM can grow into substantial velocity dispersion later. 

Including the Jeans filtering resulting from the velocity dispersion itself will regulate the growth of the dispersion. 
Recall that the approximate filtering scale is Rf = (12Ti/5) 1 / 2 = (12^4/5aj) 1 / 2 . To be precise, Rf quantifies the 
frozen-in power suppression due to some velocity dispersion present at initial time ai, long after this dispersion has 
redshifted away. When we in effect add new dispersion at time a*, through the RG equation, the smoothing will 
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not be instantaneous; however, it takes place quite quickly, so I will assume I can use Rp = in the 

RG equation, where a ~ 3 is a fudge factor to account for the lag between the introduction of dispersion and the 
smoothing of the power spectrum. As with the original definition for Rf, where I assumed that the smoothing is 
Gaussian, this approximation means the results should only be trusted at the order-of-magnitude level. I will do a 
more exact calculation below. Assuming Gaussian filtering, and power spectrum A 2 (k,a) = A*(a)(fc / 'k p ) 3+n (with 
n > —3), the variance is 



A»r[(3 + n) /2] 
2 [k p R F f +n 



(34) 



Using Eq. 



in Eq. 



I find, well after the initial time, 



3 + nA 2 p (a) T((3 + n)/2) 

3+n 



k p y/ 12 /5aa 



(35) 



Recall that T (a) 
very clear: 



Aa . This result may not be very illuminating, but it can be re- written in a way that makes it 



<*i [Rf (A)}) 



(36) 



i.e., the velocity dispersion simply grows to the point where the Jeans-like filtering reduces the variance to of order 
unity, with the exact relation dependent on the slope of the power spectrum. 



RG method, 2nd iteration 



The calculation leading to Eq. |36J does not look entirely self-consistent. At various stages in it, I used the fact 
that Tb oc a" 1 , however, in the end, the renormalized T is oc a 4 ^ 3+n \ growing quickly rather than decaying. The 
brute force way to solve this problem would have been to compute RG equations for the pieces of the calculation 
that depended on this assumption (e.g., the amplitude of 7r), and solve them jointly, ft is simpler, and will lead to 
self-consistent and enlightening results, to redo the calculation starting with To = Aa a , with a = 4/(3 + n). The 
large-scale solutions for 5 and 6 are unchanged, but we now have 7ri(fc — > 0) = (2 + a)~ 1 Si(k — » 0). The smoothing 
estimated from the k 2 expansion changes for all three fields: 



Si 



1 



(4 + a) 



-k'T a 



(2 + a) a(5 + 2a) 



S(k^0) 



(37) 



1 _2^ fc2fo _(4 + a) 



{2 + a) a(5 + 2a) 



<S(fc-»0) 



(38) 



A: 2 T 



(4- 



a (5 + 2a) 



*(fc-»0) 



(39) 



Note that there is a qualitative difference in these smoothing kernels relative to the a < case in that T appears 
instead of Tj, i.e., because the comoving Jeans scale is increasing when a > 0, the smoothing continues to grow with 
time rather than freezing out. The numerical factors are also different, with 9\ being generally significantly smoother 



than Si, 
Now, 



with tti between them. For example, for n = —1.4. 



R S F 



0.48 f n 1/2 , R% ~ 0.90 f n 1/2 , 



and R F 



0.72 T, 



1/2 



1 A 2 p (a) r [(3 + n) /2] 
^ a 2 (k p Rff +n 



(40) 



where R F 92 = {R F 2 + R e F 2 )/2 and R% and R F are the smoothing scales implied by Eq. J35]) and fgg]). The important 
thing to recognize about a 2 g = (tti9i) here is that it is time independent, i.e., the growth of the power spectrum is 
canceled by the change in R F 62 oc Tq. 
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Now, I have recomputed the linear equations given Tq oc a a , so it remains only to recompute the perturbative 
corrections T 2 . Eq. (J27J) for T 2 is changed, because T is no longer a solution to the zero order equation for T. This is 
a key fact - at the heart of all renormalization is the concept that the naive lowest order result is not always the best 
one to perturb around when doing computations to higher order. Sometimes it is better to perturb around something 
else, with the criteria for "better" being simply that the corrections remain small. The usual renormalization procedure 
of doing the calculation to lowest order, using the results to compute higher order corrections, then absorbing large 
corrections back into the lowest order, is an algorithmic way of accomplishing what one could accomplish by simply 
perturbing around the better starting point from the beginning (if one has a way to determine/guess what it is). 
Here, I am perturbing around To = Aa a , i.e., T2 is defined by T = Aa a + T2 + and the evolution equation I derive 
for f 2 is: 

dfo2~ 

—j-+T 2 = -T (Tr 1 6 1 )-(a + l)To . (41) 

The 2nd term on the right hand side is new. Note that I have not yet fixed A. Normally, A would be set by the initial 
conditions, but there is no need for that because the homogeneous part of the solution for T 2 can be used to match the 
initial conditions (then, since the homogeneous solution is just oc a -1 , the memory of initial conditions will fade away). 
If I choose A to make | (tt\0i) — (a + 1) = 0, which is possible because a^ e depends on A through the smoothing 

kernels, the equation for T 2 becomes trivial, just the homogeneous equation, i.e., there are no perturbative corrections 
to To = Aa a l (That is, for the correct amplitude, and the special value a — 4/(3 + n).) This result should not be 
viewed as some kind of fortuitous coincidence, or artificial tuning. It is what we aim for in a renormalization group 
calculation, and reflects the physical correctness of the idea that the velocity dispersion should be rapidly growing, 
tracking the non-linear scale, with the the unstable cold start completely irrelevant (at least after enough time has 
passed, and for this kind of power law power spectrum). 

We immediately have an equation like Eq. (|36p . except coming from a much more accurate calculation: 



Mi> = |(« + 1) = ^- + | ■ (42) 
I 6 + n 2 

The variance of 5\ is larger, because of the small coefficient of 7Ti, and lesser smoothing of Si, i.e., the more accurate 
version of Eq. ([3^)) is: 



«i*».M(D"- M .^(i±-r' 

e.g., this evaluates to tjg = 55 at n = —1-4, with = 5.2. Keep in mind that this is still essentially the linear theory 
variance. There will be non-linear corrections, however, especially for relatively large n, it seems like a very good 
thing to be starting with finite linear variance rather than the infinite variance of the bare power law. For example, 
for n > — 1 the standard 1-loop PT correction diverges [42j], while here that problem is clearly solved, i.e., we have 
a natural high-Zc cutoff. Choosing the pivot point at &nl, defined by A 2 (/cnl) = 1, these equations can be solved to 
give 

s f T[(3 + n)/2} ^ ( (3 + nfT[(3 + n)/2} \^f3 + n\ 1/2 
*nl**=^— ) ^ 6 (5 + n)(7 + n) ) ) {U) 

or 

f 1/2 _ / (3 + nf r [(3 + n) /2] \ ^ / (5 + n) (23 + 5n) \ 1/2 
NL y 6(5 + n)(7 + n) J \ 2 (3 + n) (4 + n) (6 + n) ) ' ( ' 

~ 1/2 

e.g., /cnlT = 0.12 for n = —1.4. As one might guess, the velocity dispersion scale increases relative to the non-linear 
scale as the power law increases to include more small-scale power. Note that this calculation was self-consistent, 
without any very ugly approximations, although the use of Gaussian smoothing kernels based on the k 2 expansion 
makes the results still only approximate. 



D. A more exact, general approach 



A simpler way approach the split between homogeneous background and perturbations, which I could have 
just started with (except I think the above derivation has some pedagogical value), is to define Uij (t, x) = 
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(Sv l 5vi) p (t, x) - T (r) 5#, with T (r) 5$ = (^(Sv i Sv^) p (r, x)^ . Then it is clear that 

df 1 



and 

da ij 



8t 



^- + 2HT = -^(v k d k a u + 2a lk 8 k v l ) , (46) 



2 U a ij + T (diV j + djv*) = -A [v k d k <y ij + a jk fa* + a ik d k v ] ] (47) 



where A[/] = / — (/) and I have dropped the g ij term. These are the fully non-linear equations, with the need to 
include the right hand side of Eq. (|46|) as the source of homogeneous velocity dispersion, while subtracting the same 
thing from the perturbations, being a simple result of the definition of the mean and perturbations, rather than some 
kind of renormalization. Either way, however, we have the same basic idea that the perturbations feed back on the 
background, new relative to standard PT with just density and velocity. 



1. RG approach at the level of equations rather than solutions 

We still need some perturbative approach to solving the coupled equations for T and the fluctuations in density, 
velocity, and velocity dispersion. The RG approach used above and in [55| is not ideal. First, one needs to solve the 
perturbative equations analytically, which isn't generally possible without approximations that can lead to uncon- 
trolled errors. Second, the RG equation that is derived may not be analytically solvable, as I found in (55|. We can 
solve the first problem (needing analytic solutions for the perturbation evolution), without necessarily making the 
second problem any worse, by considering applying the ideas behind the RG method at the level of equations rather 
than solutions. Suppose we want to solve 

6 = f(S) (48) 

The standard perturbative equations are 

Si = f'Sx , S 2 = f>5 2 + ^-Sl , ... (49) 

etc., where /' = df/dS(S = 0), etc.. Our solution up to 2nd order is 8% + 62, but note that we could solve a different 
pair of equations like this: 

51 = f'Sr + AS , 6* = f'S 2 + ^S* - AS , (50) 

and get the same solution <5i + S2 = S± + <5|. The idea, analogous to the above use of the RG method, is to choose Ad 
to be 2nd order and absorb any undesirable parts of S2 . If we choose AS = 4r<5f + /' (S* — Si) we have 

% = fsi + y s ? > % = - ( 51 ) 

where I have dropped 3rd order terms. Now, if desired, we can set the initial value of S% to zero and forget about it. 
All we have left is the equation for 5$. This looks like nothing more than a very long-winded way of saying "if you 

want to solve S = f(S) to 2nd order, solve the truncated equation S = f'S + ^-S 2 " , and at the level that I will use it 
here, that is really all it is. However, there was potential for more than that within the argument, in that AS could 
have been chosen to be something else if this was convenient to, say, remove a particularly divergent part of S2 while 
still producing an easy to solve equation for 5*. Also, 5| could in principle have been given some part of the initial 
conditions. I present this mainly as an explanation of the connection between the RG method and the method of 
simply numerically solving truncated equations. The difference between solving these truncated non-linear equations 
and the original perturbation equations is that all of the feedback between 2nd and 1st order is included, rather than 
leaving the first order solution fixed while computing the 2nd order solution. 

As suggested in [5|| , and implemented in [|32j , one approach to computing the power spectrum of the fields is to 
use the time evolution equations for the fields to derive time evolution equations for the power spectrum, and to 
solve those numerically. Unfortunately, these equations involve the bispectrum, which must then be solved for, and 
the bispectrum evolution equations involve the trispectrum, etc., i.e., one has an infinite hierarchy of equations. The 
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method of truncating this hierarchy by setting the connected trispectrum to zero is equivalent to applying the above 
reasoning about truncating the series of perturbative equations, i.e., at linear order in the initial power spectrum 
the perturbative equations only contain the power spectrum, at 2nd order they only contain the power spectrum 
and bispectrum, while the trispectrum is needed at 3rd order. This type of approach seems like the only option for 
including the velocity dispersion as discussed in this paper in high precision calculations, because one cannot solve the 
evolution equations analytically when including the velocity dispersion. One has to recognize that standard PT could 
not even produce the analytic results that it does (for time evolution - there are generally numerical integrals over 
k) without the special fact that, to a very good approximation, one can use the solution to the evolution equations 
in an Einstein-de Sitter Universe in a realistic Universe as well - this is a very fragile situation and any added 
complication tends to produce un-solvable equations (e.g., non-negligible massive neutrinos are a good example of 
this, where, additionally, equations including velocity dispersion would naturally appear explicitly (5ll. l8ll]). and this 
approximation can never describe the kind of dependence on the equation of state of dark energy found by [82| . 

In anticipation of eventually incorporating the velocity dispersion into a scheme for evolving the power spectrum 
and higher order statistical equations (the alternative method of closing the hierarchy in [53| may also be a useful 
way to do this), I write here the evolution equations at 1st order in the power spectrum. Without the new velocity 
dispersion parts, these would just be the equations for the standard linear theory power spectrum. (I am still assuming 



an EdS Universe, although that is not necessary here.) 

P' ss (k) = 2P se (k) (52) 

He ( fc ) = l p ss (k) - ^Pse (k) + Pee (*) - k 2 f [P ss (fc) + 2P Sn (fc)] (53) 

P' eg (k) = W se (k) - P ee (k) - 2k 2 f [P se (k) + 2P e ^ (k)} (54) 

PL (k) = P se (k) + Per (k) ^ j P S „ (k) (55) 

PL (k) = Pee (k) + ^P s „ (fc) - f | + P e « (k) - k 2 f [P fa (k) + 2P n7V (k)] (56) 

P^ (k) = 2P 9 , (k) -2(1 + P W7r (fc) (57) 

2-1 f°° 

T' + T=-T — dq q 2 P B , (q) (58) 
3 2vr^ J 



Intuitive understanding of this last term is that it measures the tendency for velocities to converge on overheated 
places, and vice versa (remember that, for the definitions we're using, positive 9 and ir mean, respectively, convergence 
and, in a one-dimensional sense at least, negative 2nd derivative of the dispersion). Note that, for this paper, writing 
all of these equations involves some redundancy, in that one could just solve Eqs. (|14H16p to get growth factors as 
functions of k, using them to evaluate Eq. (|58p . When one goes to higher order, however, where <5, 9, and 7r are no 
longer perfectly correlated, one will need all the equations. 

2. Power law power spectra 

Before I solve these equations completely numerically, it is informative to assume a power law power spectrum, 
for which we can derive some scalings. Defining /cnl by Aj^fcNL) = 1, and using time evolution A^(fc) oc D 2 , I find 
/cnl D~ 2 /( 3+n \ The velocity dispersion must follow 



f oc k^? oc 



(59) 
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FIG. 1: Ratio of linear power with Jeans-like suppression by velocity dispersion to without (i.e., to the T — » limit). The 
left panel is for a power law power spectrum with n = —1.4, while the right is for n = —2.75. Lines are identified in the figure 
panel, a = f 1/2 . 



Inevitably, this is the same evolution that the RG calculation above predicted. For this to be consistent with Eq. 
([55]). we must have 

(On) = -£- + !, (60) 
6 + n 2 

i.e., as structure grows on large scales, the Jeans- like effect of velocity dispersion must erase enough small-scale 
structure to maintain fixed (On), with the fixed value being larger for power spectra where there is less small-scale 
power. Again, this result is identical to the RG result. This applies only for n > — 3 of course - for n < —3 this 
discussion breaks down right at the beginning, with the definition of We can also solve analytically for the 

growing mode amplitude of n in the k — > limit, which is suppressed relative to 5 and 9 because of the dhiT/drj 
factor (this came from the fact that tt fluctuations are measured relative to the changing background dispersion). The 
result, again the same as in the RG calculation, is 

7r(fc^0) = i|±^5(fc^0) , (61) 

i.e., the dispersion fluctuations are smaller for models with less small-scale power, although the best way of looking 
at this is probably to think of these models as the ones where the non-linear scale is changing more quickly, and 
therefor the homogeneous dispersion is increasing more quickly, diluting the perturbations. All that remains to be 
calculated (for a power law power spectrum) are the high-/c suppression kernels that must be applied to each field, 
which generally must be obtained numerically. 

Figure Q] shows these kernels, i.e., the effect of the Jeans-like smoothing due to the mean velocity dispersion, for 
power law power spectra with n = —1.4, corresponding to roughly the present non-linear scale, and —2.75, appropriate 
to the non-linear scale at a much earlier time. To be clear, Fig. Q] was made by evolving Eqs. (|52ti58|) numerically, 
with initial conditions for a scaling solution determined by first evolving assuming the scaling solution for T, then 
restarting using the results for the initial kernels and normalization (at that point, the system will evolve stably 
on the scaling solution). Generally, the initial conditions would break the scaling, e.g., if one starts with the bare 
un-smoothed power law, that is not consistent with scaling (one would think that non-linear effects would erase the 
memory of a break in scaling in the distant past, but that does not happen at linear order). In each case Fig. [1] shows 
the power spectrum from the numerical calculation divided by the power spectrum that would be obtained by taking 
the T — > limit (specifically, taking the overall normalization of T to zero, which does not affect dhiT/dr] which sets 
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the normalization of it). The kernels for the different power laws are dramatically different when plotted as functions 
of ka (a = T 1 / 2 ). Additionally, the relation between the non-linear scale and a is much different: a = 0.19fc^ for 
n = —1.4, while a = 0.000023/^^ for n — —2.75, i.e., the dispersion scale lags much farther behind the non-linear 
scale for the spectrum with less small-scale power, not surprisingly. Note that the non-linear and dispersion scales 
are increasing extremely rapidly for n = —2.75, like a 8 , so that the dispersion scale only lags the non- linear scale by 
a factor of ~ 1/4 in expansion factor. Similarly, even though the extreme-looking spike in the kernel covers a factor 
~ 2 in fc, a given mode only spends ~ 10% of an expansion factor within this feature. In spite of the apparently 
large differences, there is a remarkable relation between the two power law cases: the expansion factor by which the 
dispersion scale lags the non-linear scale, i.e., (a /cnl)' 3 "*"™^ 2 > is essentially identical between the two cases - 0.262 
vs. 0.265 from my numerical solution, for n = —1.4 and n — —2.75. To be clear, the dispersion in each case is 
determined by the requirement that Eq. (|60p for (8tt) is satisfied after the power is filtered by the complex-looking 
kernels in Fig. [T] Eq. ([43)1 actually does a reasonable job anticipating this relation, in spite of its approximations, 
predicting 0.18 and 0.12 for n = —1.4 and n — —2.75, respectively, but it is not clear if there is any deep reason 
for the near perfect agreement in the exact calculation (the under-prediction by Eq. [45] is understandable, as the 
Gaussian approximation I used there for the kernels works essentially perfectly for the initial suppression, but misses 
the wiggles, which produce extra dispersion). 

3. ACDM 

The situation for a ACDM power spectrum is less straightforward than for power law power spectra, where we could 
imagine the initial dispersion was arbitrarily small, yet still obtain an interesting dispersion tied to the non-linear scale. 
In the real Universe, we have specific physically motivated initial conditions, and we cannot ignore them (at least not 
within the lowest order calculation in this paper - as I discuss below, I actually expect that the influence of the smallest 
scale initial conditions will be erased when the calculation is taken to the next order). A typical WIMP with mass 
m ~ 100 GeV decouples thermally from the relativistic electrons, positrons, and neutrinos at Td ~ 20 MeV 83]. The 
coupling to the relativistic plasma actually produces acoustic oscillations in the transfer function, which I will ignore. 

1 /2 

The ID velocity dispersion at kinetic decoupling is v]p = (§ 5*) c. For a typical WIMP, the scale below which power 
is erased is of order the scale of free-streaming after decoupling, but would actually be similar even for much more 
massive (i.e., slow moving) particles, because a substantial fraction of the effect is due to Silk-damping- like friction 
between the WIMPs and leptons during decoupling [83| . Using Eq. (48) of [83| , I will assume the power entering the 

1 /2 

matter dominated era is suppressed by exp(— k 2 /k 2 s ) with kj^ ~ (f Td In (T eq /l.05rd) where Td is the conformal 

time at decoupling and T eq is the conformal time at matter-radiation equality (after which the free streaming has 
frozen). For the parameters I am using, the primordial velocity dispersion, in Hubble flow distance units, extrapolated 
to the present time assuming no enhancement by structure formation, would be T 1 / 2 = 8.6 x 10~ 10 h~ x Mpc, and the 
smoothing scale is kj^ = 10~ 6 hr 1 Mpc. 

The total density variance in a realistic ACDM model, in linear theory, without any enhancement of velocity 
dispersion by structure formation, is ((5 2 ) ~ 517Z? 2 , where D is the linear theory growth factor normalized to be 1 
at the present time. If we could assume that <jg v — {6mi) followed the same form, as it does before there is any 
feedback (and assume an EdS Universe), the solution to Eq. (|58p would be T/Ti — exp [517 (a 2 — a 2 ) /3 — In (a/a^)] . 
We see that if (dmi), which starts out equal to (^i), remains anywhere near as large, the current dispersion would 
be extremely large, i.e., T/Ti ~ exp(517/3) ~ 10 75 . Clearly this is too large, which is primarily explained by the fact 
that, once the velocity dispersion begins to increase, (Oxtti) falls well below (<5 2 ), as we saw for power laws in Eq. (|61[) . 
We can estimate the size of this effect by continuing to ignore any additional smoothing by the extra dispersion, but 
solving Eqs. (j6"0|) and (f6"Tj) to find an effective n that will allow both equations to be satisfied, given the large density 

variance (<5 2 ). The result is (9mi) ~ (§ (Sf)) 1/2 , or, for (S 2 ) = 517L> 2 , n = -2.77 (at D = 1) and (9mi) = 27.1L>. 
Using this we can solve Eq. (]58fl to obtain T/Ti ~ exp[(2/3)27] ~ 10 8 . This may seem like a big number, but it is 
in fact inadequate to produce a dispersion that is dynamically significant at late times, i.e., we need a boost by ~ 9 
orders of magnitude in T 1 / 2 , not just 4. A full numerical evaluation of the evolution equations for T and the power 
spectra basically confirms this analysis. T 1 / 2 = 6x 10~ 6 h~ l Mpc at the present time, ~ 4 orders of magnitude larger 
than the primordial dispersion but nowhere near macroscopic, and not even large enough to begin suppressing the 
power much beyond the primordial kj^ = 10~ 6 /i -1 Mpc. The full calculation gives (d 2 ) = 506 and (9 mi) = 26.4, 
close to the no-smoothing approximation. 

Now, the derivation of Eq. (|58p did not assume linear theory, only that velocities are irrotational, and that we 
can drop higher moments of the velocity distribution function, so it would be correct to use a non-linear version of 
(8tt) to evaluate it. For example, the standard 1-loop PT power spectrum has (<5 2 ) > 10 5 , easily large enough to 
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completely change the picture for the growth of dispersion, i.e., the feedback between the production of dispersion 
by non-linearity and the smoothing of the perturbations by dispersion will be critical. Exactly how this works out 
is impossible to say at this point. We don't have higher order PT calculations that include this form of velocity 
dispersion, and, as I found above even at linear order, it is almost surely impossible to derive analytic PT results with 
dispersion. The next step appears to be to set up and solve numerically the non-linear versions of Eqs. (|52H57[) . and 
the inevitably accompanying bispectrum equations (at least). It seems very likely that this will produce the level of 
dispersion necessary to produce the percent level corrections to the power spectrum at k ~ 0.1 ftMpc -1 found by [6^ |. 
This requires a dispersion level corresponding to ~ 1 h Mpc. For non-linear scale and n ~ —1.4, the result I 
found for a power law power spectrum was just this ~ 1 h^ 1 Mpc. This would also be consistent with the scale at 
which the RG evolution in [55| saturated to it's fixed-point slope of n = —1.4. 



III. REDSHIFT-SPACE DISTORTIONS 



Interpretation of galaxy redshift surveys generally req uires us to account for the distortion of the observed clustering 
due to Doppler shifts caused by peculiar velocities [lo, H3, HE HE H3] , including, potentially, those due to velocity 
dispersion as well as bulk flows. Here I will continue to discuss dark matter only - the differences between galaxies 
and dark matter, while certainly important, are the subject for another paper. 

Starting from the distribution function, defined above, it is straightforward to derive the perturbative redshift- 
space distortions including possibly velocity dispersion. Defining the redshift-spacc coordinate for a particle, s = 
x + f pn/amTi where f is the unit vector pointing along the observer's line of sight and p« = p • r, the density in 
redshift space is: 

Ps (s) - m a- 3 J d 3 x d 3 p / (x, p) 5 D (s - x - f -£L_) = m a~ 3 J d 3 p f (s - r-£L_, p) . (62) 
I now expand the distribution function in a Taylor series in pn/arriH, 

* (., - ,n / < s ,p> + m «-/ (- J^L) + 1™ jtiP™ (J^f + ... ,63) 

Taking the derivatives outside of the p integrals, and using the above relations between moments of the distribution 
function with respect to momentum and bulk velocity and velocity dispersion, I obtain 

5s (s) = 5 (s) - I A [(1 + 5) «„] + [(1 + 5) («f + a u )] + ... (64) 

The linearized version of this equation, with zero-order velocity dispersion = TL 2 Tq, is 

The first two terms are the usual linear theory result , while the last two are related to the inclusion of dispersion. 
In Fourier space, and using v l = —iHk l k~ 2 9 and = 2H 2 Tokikjk~ 2 ir (for irrotational velocities), this is 

5s (k) =(l- h 2 f fi 2 ^j 5 (k) + p 2 6 (k) - k 2 f ^ vr (k) . (66) 

where I have not used the usual 6 = f5 = d ^ n D a 5 simplification because the relation between 9 and 5 is complicated by 
the presence of dispersion (their Jeans smoothing kernels are different - note that, like [67[, I found in Fig. [T]that the 
velocity divergence is suppressed substantially more than the density fluctuations, which means that these corrections 
will be more important for redshift space distortions than they would be in real space). Similarly, 7r will only be 
proportional, not equal, to 5 on large scales (usually smaller), and will have a different smoothing kernel. The first 
dispersion term looks like the first term in a small-fc expansion of the smoothing kernels that have been used to model 
non-linear peculiar velocities in the past 88] . There will clearly be a series of similar higher order terms that could 
be re-summed into a simple smoothing kernel. Note, however, that there will also be a series of terms involving the 
homogeneous kurtosis of the velocity distribution, which generally could be of similar order to the re-summed terms, 
so it isn't obvious that the re-summation will be particularly useful (i.e., once the re-summed terms are needed, other 



14 



independent terms may be needed as well). Also, note that in cases where the velocity dispersion is relatively small, 
the similar distortion coming from the v 2 term in Eq. (|64[) could easily dominate over this term. 

Clearly, redshift-space distortions will require further consideration as the higher order program suggested by this 
paper is carried out. The suppression of the new terms by fc 2 T) probably means that they will not be significant 
unless terms higher order in S are also significant on the same scale, i.e., fc 2 Tb probably should be considered to be a 
new expansion parameter, at least as small as 6, and probably 5 2 (this is consistent with the way that To is derived, 
as proportional to the linear power spectrum, however, the correspondence is not unambiguous because of the very 
different origin and form of growth of the terms). In any case, interpretation of very high precision observations, on 
the imperfectly linear scales where most of the information resides, will clearly require a more subtle calculation than 
has so far been done, however, one should not despair, or flee to relying entirely on numerical simulations, because 
an understanding of the detailed redshift-space structure of observations will ultimately make our measurements of 
fundamental physics much more robust and believable. Note that nothing requires, or even really suggests, that the 
effective To for galaxies [8{| [§3, UHL [§2, [93| must be equal to that for dark matter - in the end, it will probably be a 
completely free parameter for galaxies, like bias. 

IV. DISCUSSION 

The deficiencies of standard perturbation theory are a lack of control of the higher moments of the velocity dis- 
tribution function, beyond the pressureless fluid approximation, problems with accuracy related to the fact that PT 
integrals include small scales that are generally highly non-linear, even when the scales we are interested in are weakly 
non-linear, and, of course, simply not working on small scales where the fluctuations are large. The deficiencies of 
numerical simulations are speed, and the related fact that statistics (e.g., the power spectrum) cannot be computed 
directly, but only as averages over realizations of the random density field, which must contain many orders of mag- 
nitude more degrees of freedom (e.g., a billion) than one really needs to describe the statistic of interest (e.g., a few 
dozen for the power spectrum). Additionally, and probably most importantly, the cumbersome, opaque nature of 
simulation results is greatly exacerbated when they are used to model galaxies or other observables instead of just 
dark matter, while the advantage of being more or less exact for gravity alone (at least in the straightforward limits 
of large box size and particle density) is lost. 

This paper directly addresses the PT deficiency of missing higher moments of the velocity distribution function, 
showing how they can be re-activated and generated at a significant level, starting from first principles. The approach 
here may also improve PT by providing natural regulation of small-scale structure, i.e., the Jeans smoothing that 
arises here has the same kind of effect as the propagator resummation of [13, EH . The philosophy of this paper is 
that the cold "streams" often discussed as "crossing" are mythical objects - what one sees in the real Universe is 
always some evolution of effectively warm (although maybe not very warm) material. The meaning of this idea is 
very clear for power law power spectra, where, as we saw, non-trivial effects of velocity dispersion can be computed 
without any initial dispersion entering the discussion. The dispersion bootstraps itself up from an arbitrarily small 
start. The temperature is locked into a sort of equilibrium with the growth of structure. The situation is not as clear 
for ACDM power spectra, not so much because the ACDM is cold as because there is very little small-scale power 
in these models, so the dispersion computed to linear order in the power spectrum remains well below the non-linear 
scale (although orders of magnitude larger than it would be if there was no structure). This situation will change when 
calculations are done to higher order, where I showed that there is enough small-scale power generated to produce 
dispersion that would be far too large in absence of feedback on the structure formation itself. The initial velocity 
dispersion should be rendered irrelevant when the system moves into a sort of self-regulating mode, like the power 
law example, where the velocity dispersion and growth of structure are tightly coupled by feedback between them. 
Ultimately, it seems likely that the best effective small-scale model for doing perturbation theory will involve some 
more general balance between different pieces, i.e., density, velocity divergence, dispersion, and maybe even vorticity, 
etc., because we know that physically this is what the small-scale structure really is, i.e., halos which can only be 
described as a delicate balance of the original perturbative LSS quantities. To put this another way: hopefully, when 
all of the relevant elements are included, there will be some fixed-point structure for small scales with a clear physical 
interpretation and effectively far fewer than the original number of degrees of freedom (akin to the fixed-point power 
law found in [55| . but with richer structure). 

This paper does not exactly contain useful quantitative take-home results. The results are primarily a procedure 
to follow for future calculations. If there is a single equation that best represents the results, it is probably Eq. (|45[) . 
which shows how, for a power law power spectrum, the velocity dispersion tracks the non-linear scale, with Jeans 
filtering erasing more and more small-scale structure as the larger scale structure grows. Eqs. (j4l)|) and (T55|) are also 
critical, in that they show generally how homogeneous velocity dispersion is generated out of fluctuations. The key 
new variable, equivalent to 8 and 6*, but for perturbations in velocity dispersion, is 7r cx didjU 1 ^ . 
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The next step in this line of work is to derive to the next-order equations like Eq. (|52II57[) (but including bispectrum 
equations), which are needed to have any chance of properly describing ACDM. Then the results can be tested by 
comparison to numerical simulations. While ACDM is of course the ultimate goal, tests of the theoretical concepts 
here might be more conveniently done with power law simulations, particularly ones with relatively blue spectra, as 
in [54j . Any of the renormalization approaches that have been developed recently can probably be applied to the 
problem of renormalizing the velocity dispersion, at least in principle, i.e., resumming a set of terms that generates the 
dispersion, either explicitly or through a renormalization group equation. The obstacle to doing this elegantly may 
be the difficulty of obtaining analytic solutions as the evolution equations become more complicated (this problem of 
requiring analytic solutions is, I think, the primary reason to favor the "numerical evaluation of evolution equations 
for statistics" approach advocated in this paper over other, more completely analytic, recent approaches). 

One might ask "why stop with the 2nd moment of the distribution function, i.e., why can we drop from Eq. 
([5])?" One hope, which will need to be verified by future calculations, is that the effect of increasingly high moments 
on large scales may take the form of a gradient expansion, i.e., in Fourier space a series where the effects of increasingly 
high moments enter multiplied by increasing powers of k. In this case, we would expect that, on scales where the 
effect of the 2nd moment are already small, the effects of higher moments will be even smaller. 

The result for redshift-space space distortions (Eq. leads to the question: Why do we use v, and in this 
paper er^ , as the variables to be solved for perturbatively, rather than, e.g., momentum (1 + 6) v and kinetic energy 
(1 + S) (viVj + <Tij)7 An answer one might have considered was that velocities are needed to compute redshift-space 
distortions, but here we see that the most direct quantities for that are in fact momentum and energy, with additional 
perturbative calculations needed when starting from S, v, and cy. A change to total kinetic energy would have 
the potentially substantial effect of increasing the Jeans-like smoothing of the linear power, because the zero order 
pressure would be larger. I do not see any clear a priori reason to favor one option over another - they are just 
different ways of arranging a series of terms, which should be equivalent if one could include an infinite number of 
terms. Note that, while the choice of variables is optional, the renormalization of the energy-related variable is not 
optional - it simply makes no sense to ignore the fact that the size of terms in the series describing one of your 
variables is increasing rapidly, rather than decreasing, with order. There is a lot of circumstantial evidence that using 
total kinetic energy could be useful. The renormalization/resummation of the propagator in |4?| leads to filtering 
much like the Jeans filtering we find due to velocity dispersion, but with scale given by the bulk velocity power 
spectrum. The Lagrangian PT-based scheme of [49j leads to a similar result. Another possibility to consider would 
be the evolution of large-scale fields with the small-scale structure explicitly averaged out, which would naturally lead 
to the inclusion of small-scale bulk velocities in the dispersion term [H, [95|, [96| . One might also consider using the 
Schrodinger equation representation of the exact Vlasov equation, proposed in (97j . combined with the approach of 
this paper. 

In the end, one could view the approach in this paper as a re-activation of the program under discussion in papers 
like [98[, which set out to integrate the BBGKY equations numerically. This reconsideration is timely because of 
several developments over the last thirty years. We now know the appropriate class of models to focus on, especially 
including the appropriate the initial conditions. There has been a lot of work on both perturbation theory beyond 
linear order, and on N-body simulations, with the limitations of each teaching us a lot about what is needed from 
new methods. We also have a specific ca l culation to focus o n: the scales where baryonic acoustic oscillations (BAO) 
are observable pHl l99l Hool . Hoil . Il02l . IToH . Hoi . H05l . HoH [l07l ] . which points us toward the perturbative approach that 
motivates truncating the hierarchy and believing that high precision can be achieved (in contrast to [9a ] . who were 
focused on the more strongly non-linear regime, where the truncation used here is not well- motivated). The same 
scales are also potentially the most powerful for other measurements based on, e.g., redshift-space distortions [jj. 

The basic form of calculation I do here is completely standard in some other areas of cosmology. For example, 
the evolution of the homogeneous (background) value of an interacting scalar field in the early Universe is affected 
by quantum and thermal fluctuations. Its evolution is not described by the original equation of motion with all 
perturbations ignored, but by a renormalized effective potential, which is at least formally infinitely different from 
the original potential, i.e., completely dominated by the part due to fluctuatio ns 110811. Another intere stingly similar 
calculation is the development of equilibrium after preheating after inflation. |l09l . Illd , lllll Ill2l . 1 1 1 3j ] perform fully 
non-linear simulations of the interaction of scalar fields, much like large-scale structure simulations, with the added 
twist that the background density and pressure are affected by the fluctuations. The evolution of the scale factor is 
calculated by taking spatial averages over the fluctuations as the simulation is running. In the beginning, the nearly 
homogeneous inflaton field dominates, and a very naive calculation might compute the expansion of the simulation 
in advance assuming homogeneous evolution, but by the end of the simulation the homogeneous component has 
actually practically disappeared, with the background evolution completely dominated by the fluctuations, which 
behave like radiation. Of course no one would ever do the very naive calculation just mentioned, where the affect of 
the fluctuations on the background equation of state is ignored... except that this is what we do when we assume 
that the tiny initial temperature of CDM means that it will forever remain cold. 
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Substantial work remains to determine if the approach in this paper enhances the accuracy of predictions of quasi- 
linear clustering in the real Universe; however, it is now possible to consider an effect that was previously outside of 
any first-principles computational control in this kind of perturbation theory. The bottom line of this paper is that 
even linear theory fluctuations necessarily imply a significant one-loop renormalization of the background velocity 
dispersion. 
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